This paper studies the feasibility of building a complete smart system for predicting electrical power consumption, as electricity's market share is expected to grow over the coming decades. Smart grids and smart meters will soon help utility companies and their customers. New services and businesses in energy management require software development and data analytics skills, and these new services and enterprises are competitive. The project's electricity consumers are categorized by the percentage of their power usage in each hour. This classification was done using data mining (specifically, five algorithms) and data analysis theory. The segmentation aims to help each group reduce energy use and expenditure, encourage energy-saving activities, and promote consumer involvement by giving tailored guidance. It is obtained through an iterative process that combines a computer classification algorithm, post-analysis, and data mining with visualization and statistical methods.
1. INTRODUCTION 

Energy efficiency has been gaining importance among the priorities of the energy world; it has been called the "invisible fuel", meaning that the best energy choice is not to waste energy. In regions such as Europe, which depend heavily on external energy supply, optimization at every stage of the energy chain is a must from both an ecological and an economic point of view [1]. The laws and obligations set by the Energy Efficiency Directive follow this path. The European Union aims to reduce energy consumption by up to 20%, achieving by 2022 a consumption below 1,474 Mtoe of primary energy or 1,078 Mtoe of final energy, while setting targets adapted to each country's characteristics [2]. In addition, small and medium-sized plants based mainly on renewable sources (PV and wind) or on advanced technologies such as fuel cells [3] generate power close to where it is consumed. This favors demand management: resources are used more efficiently by storing energy that can be used when and where it is needed through a bidirectional network, and data mining helps to shave consumption peaks and to engage the customer, who takes an active role [4], as shown in Figure 1.

In the present work, special attention is given to smart meters and to the advanced metering infrastructure, since they generate the electrical energy consumption data to which data mining and analysis are applied [5]. This infrastructure is still under development, but many regulators and governments have focused especially on smart meter deployment, as is the case of the European Union [6] within its energy strategy for 2022. In some countries, electricity markets have been heavily regulated, and this rigidity has made it difficult to implement new technologies and to change the conventional market scheme quickly. This has led to the installation of private smart meters mainly in industry and in commercial and office buildings, where they allow energy management with future economic benefits [7]. In the residential sector, however, the energy savings do not compensate for the investment in private sub-metering equipment, so households rely on the "official" smart meter rollout to start managing their energy consumption [8]. Operators perform the ongoing management functions of the network; typical operations include routing and switching, fault management, real-time network calculation, operational statistics and reporting, and dispatcher training. Their role is to keep the energy system running smoothly. Many of these tasks, such as maintenance and construction, meter reading, and security management, are the responsibility of a regulated utility, and some of them can be outsourced to other domains [9].
Figure 1. The scheme for estimating the electrical energy consumption of buildings [4] 


Cuerda et al. [10] suggest that further investigation of household features and householder characteristics is required to determine the possible causes and origin of the load patterns, and whether any correlation exists between household features and load patterns; this would make it possible to provide more accurate and detailed energy-saving recommendations. For example, the load profile will not look the same if the kitchen is electric or gas, or if the hot water boiler is electric or gas: when these appliances are electric, their peaks are likely to be visible. The base load is the lowest consumption that a house has at all times, especially at night and when there is no activity at home, as seen, for example, in [11].
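The base load described above can be estimated directly from hourly smart meter readings. The sketch below is a minimal illustration, not the method of [10] or [11]: it assumes readings arrive as hypothetical `(hour, kWh)` pairs and takes the base load as the lowest value of the average hourly profile; the readings are made up.

```python
from statistics import mean

def hourly_profile(readings):
    """Average consumption for each hour of the day.

    readings: iterable of (hour, kWh) pairs with hour in 0..23 (hypothetical format).
    """
    by_hour = {}
    for hour, kwh in readings:
        by_hour.setdefault(hour, []).append(kwh)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}

def base_load(readings):
    """Estimate the base load as the lowest average hourly consumption."""
    return min(hourly_profile(readings).values())

# Two made-up days: low consumption at night (00:00-05:59), higher during the day.
readings = [(h, 0.2 if h < 6 else 0.8) for h in range(24)] + \
           [(h, 0.3 if h < 6 else 1.0) for h in range(24)]
print(base_load(readings))
```

Taking the minimum of the averaged profile, rather than of the raw readings, keeps a single unusually low reading from defining the base load.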

Time-of-use tariffs now allow a different price to be set for each hour; to shave the peaks, utilities set higher prices at peak times, usually in the evening. Again, a behavioral change is needed to reduce the peaks: the simultaneity coefficient of devices used at the same time should be reduced, for instance by running the washing machine at night, outside peak hours. Such peak and off-peak hours have been classified with well-known machine learning (ML) algorithms in a previous study [12]; Figure 2 shows the usage and deployment of different models in science.
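The effect of shifting a load out of the peak can be checked with simple arithmetic. The sketch below assumes a hypothetical two-level time-of-use tariff (peak from 18:00 to 22:00 at 0.30 EUR/kWh, 0.15 EUR/kWh otherwise) and a made-up two-hour washing machine cycle; none of these figures come from the paper or from a real utility.

```python
# Hypothetical time-of-use tariff (EUR/kWh): the evening peak 18:00-22:00 is dearer.
PEAK_HOURS = range(18, 22)
PEAK_PRICE = 0.30
OFF_PEAK_PRICE = 0.15

def price(hour):
    """Tariff applying during the given hour of the day."""
    return PEAK_PRICE if hour in PEAK_HOURS else OFF_PEAK_PRICE

def cycle_cost(start_hour, kwh_per_hour):
    """Cost of an appliance cycle that starts at start_hour and draws
    kwh_per_hour[i] during its i-th hour (wrapping past midnight)."""
    return sum(kwh * price((start_hour + i) % 24)
               for i, kwh in enumerate(kwh_per_hour))

washing_machine = [0.8, 0.4]  # made-up two-hour cycle, kWh in each hour
evening = cycle_cost(19, washing_machine)
night = cycle_cost(2, washing_machine)
print(f"evening: {evening:.2f} EUR, night: {night:.2f} EUR")
```

With these assumed prices the night cycle costs half as much as the evening one, and it no longer adds to the evening peak.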

Cogeneration plants (combined electricity and heat production), district heating and cooling, and the integration of renewable power, supported by data mining and data analysis, are the main areas of attention for energy efficiency in energy supply. Energy savings have been achieved through supplier obligations and the use of white certificates [13]. Modernization of the energy infrastructure contributes to the deployment of intelligent networks, reducing network failures (by nearly 30%) in power generation, transmission, and distribution. From the perspective of the present work, the main focus is on the part of the Energy Efficiency Directive concerned with energy efficiency in energy use. A massive smart meter rollout (72% coverage in electricity by 2022), which gives consumers access to real-time and historical information on their consumption and billing, will provide end consumers with the inputs and knowledge they need to take an active role in deciding how and when it is best to use energy. In addition to the Energy Efficiency Directive, and to optimizations using random forest, decision tree, support vector machine (SVM), logistic regression, and naïve Bayes algorithms, several other directives address the thermal and electrical demand of buildings, since buildings account for about 40% of total energy demand in the world. These directives are:

— Energy Performance of Buildings Directive: requires energy performance certificates when buildings are sold or rented, promotes the renovation of building components toward low energy demand, and requires all new buildings to be nearly zero-energy buildings. 

— Energy Labelling Directive: helps consumers obtain more information to choose energy-efficient products (air conditioners, televisions, washing machines, and lighting). 

— Ecodesign Directive: aimed at product manufacturers, requiring them to meet minimum energy efficiency standards for their products. 




Figure 2. The usage and deployment of different models in science [12] 


In contrast, data mining and data analysis do not just deliver the service; they also involve other features that enhance the whole process. The Energy Efficiency Directive distinguishes between energy efficiency in energy supply and energy efficiency in energy use. The adopted measures can be summarized as follows: 

— Energy suppliers and retailers must reduce energy consumption by 1.5% per year, supported by data mining and data analysis. 

— Annual energy-efficient renovation of at least 3% of the buildings owned or occupied by central governments. 

— Incentives for building renovation, e.g., adding insulation, double-glazed windows, and high-efficiency boilers, to improve energy performance with the help of data mining. 

— Mandatory energy performance certificates when renting or selling buildings. 

— A set of algorithms or standards covering a range of methods, such as decision tree, random forest, logistic regression, and naïve Bayes. 

— Regular energy audits for large companies, and incentives for small and medium-sized enterprises to conduct data-mining-based energy audits. 

— In order to properly control energy use, consumers' rights to full information, real-time and historical access 
to power consumption and billing data must be protected. 

— By 2022, there will be a total of 200 million smart meters installed (72% of the total). 
2. METHOD 

The aim is to provide a solution that gives each individual consumer a visual depiction of their consumption and fulfills all the desired functions. Such a solution facilitates comparison among consumers and gives easy access to the main characteristics of each consumer's electricity use, which, for instance, allows a quick audit of that consumption and the detection of possible irregularities or problems. The desired functions include: 

— Distinguish the weekday and weekend load profiles, and compare them with the load profile computed over all days without this distinction. 

— Distinguish the load profile of each day of the week (Monday-Sunday), to determine whether the weekday load profiles are similar (Friday may differ from Monday-Thursday). 

— Represent consumption in absolute values, as a percentage of consumption per hour, and accumulated over a period. 

— Study whether consumption differs among months and seasons. 

— Find the characteristic load profile for each month of the year. 

— Be able to specify exact dates and plot the consumption over that period. 

— Use different kinds of graphs (bars, points, lines, and boxplots). 

— Use color to facilitate information visualization. 

— Facilitate the comparisons between users by using interactive graphs. 
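Several of the functions listed above reduce to the same computation: the share of consumption falling in each hour of the day over a chosen subset of days. The sketch below is a minimal, standard-library illustration; it assumes hypothetical smart meter records of `(datetime, kWh)` pairs, and the function names and sample readings are made up. Passing a different `day_filter` yields the weekday, weekend, all-days, single-weekday, or monthly profile.

```python
from collections import defaultdict
from datetime import datetime

def hourly_percentage_profile(records, day_filter=lambda ts: True):
    """Share (%) of the total consumption that falls in each hour of the day.

    records: iterable of (datetime, kWh) pairs (hypothetical smart meter format).
    day_filter: predicate on the timestamp selecting which days to include.
    """
    totals = defaultdict(float)
    for ts, kwh in records:
        if day_filter(ts):
            totals[ts.hour] += kwh
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no consumption in the selected days")
    return {hour: 100.0 * totals[hour] / grand_total for hour in range(24)}

def is_weekday(ts):
    return ts.weekday() < 5  # Monday=0 ... Friday=4

def is_weekend(ts):
    return ts.weekday() >= 5  # Saturday=5, Sunday=6

# Made-up readings: 2024-01-05 is a Friday, 2024-01-06 a Saturday.
records = [
    (datetime(2024, 1, 5, 8), 1.0), (datetime(2024, 1, 5, 20), 3.0),
    (datetime(2024, 1, 6, 8), 2.0), (datetime(2024, 1, 6, 20), 2.0),
]
weekday = hourly_percentage_profile(records, is_weekday)
weekend = hourly_percentage_profile(records, is_weekend)
all_days = hourly_percentage_profile(records)
print(weekday[20], weekend[20], all_days[20])
```

A monthly profile follows from `lambda ts: ts.month == 1`, and a single weekday from `lambda ts: ts.weekday() == 4` (Friday).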

A variety of methods can be employed to group electrical load patterns, each taking its own approach to the same goal. For the current analysis, the considered methods are (i) decision tree, (ii) random forest, (iii) naïve Bayes classification, (iv) SVM, and (v) logistic regression. Figure 3 shows the data mining algorithm with variable responsibility. The electricity consumption data are the main input of the subsequent analysis; they are numerical, as they record the electricity consumption measurements of every household participating in the project [14]-[24]. Figure 4 shows the states of the data during the analysis that precedes classification. When analyzing the data, three points should be kept in mind: 

— In most cases, the dataset must be reshaped to produce the desired graphical output. 

— Handling days is a challenge when traversing the data, as time series require additional and varied approaches, particularly when a split between weekdays and weekends or a monthly distinction is required. 

— Focus on the specifics of electricity consumption data, analyzing which approach best leads to the information or conclusions being sought. 

2.1. Clustering analysis 

In clustering, the similarity structure of the data is not known in advance. The dataset is divided into clusters or groups: objects within the same group share similar properties and differ from the objects of other groups. Clustering is an unsupervised learning task, because no training dataset is available to provide prior knowledge. The objective is to label or group the observations according to their similarity. 
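The paper does not name the clustering algorithm used; k-means is shown here only as a common example of the unsupervised grouping just described. This minimal sketch assumes each household is represented by a made-up profile of consumption shares (night, morning, afternoon, evening, in %); the function and variable names are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

# Made-up daily profiles: % of consumption at night, morning, afternoon, evening.
profiles = [(40, 20, 20, 20), (38, 22, 20, 20), (10, 20, 20, 50), (12, 18, 20, 50)]
centroids, labels = kmeans(profiles, k=2)
print(labels)
```

Because the result depends on the initial centroids, the seed is fixed here; in practice k-means is usually restarted several times, and the number of clusters k is chosen by the analyst.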
Figure 3. The data mining algorithm with variable responsibility [25] 
Figure 4. States of the data when performing a data analysis before classification 


2.1. Classification analysis 

In classification, a training dataset exists whose data have previously been divided into groups. New data are classified on the basis of this training dataset: algorithms find the group to which each new object belongs according to the characteristics it shares with that group. It is a supervised learning task, because the training dataset has been labelled or grouped beforehand. The objective is to assign each new data object to an existing group, as shown in Figure 5. 
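As a minimal illustration of the supervised setting, the sketch below uses a nearest-centroid rule, one of the simplest classifiers and not one of the five methods evaluated in this paper: the labelled training profiles fix one centroid per group, and each new object is assigned to the closest one. The profiles, labels, and function names are hypothetical.

```python
import math

def fit_centroids(training):
    """training: list of (features, label) pairs. Returns {label: centroid}."""
    sums, counts = {}, {}
    for features, label in training:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign a new object to the group whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Made-up labelled profiles: % of consumption at night, morning, afternoon, evening.
training = [
    ((40, 20, 20, 20), "night-heavy"), ((38, 22, 20, 20), "night-heavy"),
    ((10, 20, 20, 50), "evening-peak"), ((12, 18, 20, 50), "evening-peak"),
]
centroids = fit_centroids(training)
print(classify(centroids, (15, 20, 20, 45)))
```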




Figure 5. Clustering (left) and classification (right) grouping in data mining [25] 


2.2. Data mining techniques 

This section presents the data mining techniques used: support vector machines, random forest, decision tree, naïve Bayes, and logistic regression. Predicting electricity usage is our primary goal, and it can be broken down into two parts: classification models based only on energy consumption data may accurately detect whether a residence is occupied, and prediction models built on the occupancy detection data can then accurately predict occupancy for a smart system. 


2.2.1. Support vector machines 

The SVM is an ML and data mining algorithm that was used to find the most reliable indicators of how much energy a person uses. Classification methods such as boosted trees and generalized additive models were also employed to answer this question. Using forward, backward, and best subset selection, we identified the subset of predictors most strongly correlated with consumption. The tree-based method used recursive binary splitting to stratify the predictor space into regions; the boosted tree technique was chosen because it is well known to be one of the strongest tree-based methods. SVMs, in turn, are well suited to high-dimensional data. In this work, three types of SVM models were used: linear kernel, radial (Gaussian) kernel, and polynomial kernel, as shown in Figure 6. 
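The three kernels named above have standard closed forms: linear K(x, z) = x·z, polynomial K(x, z) = (γ x·z + r)^d, and radial (Gaussian) K(x, z) = exp(-γ‖x - z‖²). The sketch below writes them out in plain Python together with the kernel SVM decision function f(x) = Σ αᵢ yᵢ K(sᵢ, x) + b. Training (solving for the αᵢ and b) is omitted; the support vectors, coefficients, and parameter values are hypothetical placeholders, not results of this study. In practice a library such as scikit-learn's `SVC` (with `kernel="linear"`, `"rbf"`, or `"poly"`) would handle training.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def decision_function(x, support_vectors, kernel, bias=0.0):
    """f(x) = sum_i alpha_i * y_i * K(s_i, x) + b; the predicted class is the sign of f.

    support_vectors: list of (vector, label in {-1, +1}, alpha) from a trained SVM.
    """
    return sum(alpha * y * kernel(s, x) for s, y, alpha in support_vectors) + bias

# Hypothetical support vectors and coefficients (not learned here).
svs = [((1.0, 1.0), +1, 0.5), ((-1.0, -1.0), -1, 0.5)]
for name, k in [("linear", linear_kernel), ("polynomial", polynomial_kernel),
                ("rbf", rbf_kernel)]:
    score = decision_function((2.0, 0.0), svs, k)
    print(name, "+1" if score >= 0 else "-1", round(score, 3))
```

Only the kernel changes between the three models; the decision rule and the training problem keep the same form, which is why the same SVM formulation can produce linear, polynomial, or radial decision boundaries.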
Figure 6. Sample of two-class linear SVM classifier [14] 
2.2.2. Random forest 

For the random forest, the mean response of each predictor is first calculated. For each respondent, the distances between each answer and the mean of the corresponding predictor are summed; individuals with a high total distance are those who regularly deviated from the mean across the survey. Calculating the mode of each respondent's answers made it simple to find respondents who repeatedly gave the same answer: a response was considered a high-energy-consumption response if its mode accounted for more than 90% of all questions asked. Many responses met this criterion, and closer examination showed that these respondents had given the identical answer throughout. The random forest model incorporates these responses; Figure 7 shows the phases of ensemble random forest approaches to classification problems. 
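The description above is hard to map onto a standard random forest, so the sketch below shows only the generic ensemble idea of Figure 7 as it is commonly defined: each tree is trained on a bootstrap sample of the data using a randomly chosen subset of features, and the forest predicts by majority vote. To stay short it uses one-level trees (decision stumps) and one random feature per tree; the household features, labels, and function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Best single threshold on one feature, scored by training accuracy."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = sum(
            y == (left_label if row[feature] <= threshold else right_label)
            for row, y in zip(rows, labels))
        if best is None or correct > best[0]:
            best = (correct, feature, threshold, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature subset of size 1
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Made-up households: (daily kWh, evening peak kW) -> consumption class.
rows = [(5, 0.5), (6, 0.6), (7, 0.7), (5.5, 0.4), (6.5, 0.8),
        (15, 2.0), (16, 2.5), (14, 1.8), (17, 2.2), (15.5, 1.9)]
labels = ["low"] * 5 + ["high"] * 5
forest = fit_forest(rows, labels)
print(predict_forest(forest, (4, 0.3)), predict_forest(forest, (18, 3.0)))
```

A full random forest grows deeper trees and samples several features at each split, but the bootstrap-then-vote structure is the same.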
Figure 7. Phases of ensemble random forest approaches to solve classification problems [15] 


2.2.3. Decision tree 

Based on a boosting machine, a decision tree method is used to discover the most significant predictors of energy consumption by analyzing the energy consumption component. To accomplish this, we used the decision tree method mentioned in the document. Trees are built sequentially, each using information from the preceding one, so that the model improves over time. The tuning parameters include the number of trees, the shrinkage factor, and the number of splits in each tree, as shown in Figure 8. 
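Whatever the ensemble around it, each tree is grown by repeatedly choosing the split that best separates the classes. The sketch below shows that core step, an exhaustive search for the feature and threshold that minimize weighted Gini impurity, which is one common splitting criterion; the paper does not state which criterion it used. The tuning parameters mentioned above (number of trees, shrinkage factor, number of splits) belong to the boosted ensemble and are not modelled here. The data and names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) of the best split."""
    best = (float("inf"), None, None)
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not right:
                continue  # a split must send some rows to each side
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[0]:
                best = (score, feature, threshold)
    return best

# Made-up households: (number of occupants, evening kWh) -> consumption class.
rows = [(1, 2.0), (2, 2.5), (3, 6.0), (4, 7.0)]
labels = ["low", "low", "high", "high"]
print(best_split(rows, labels))
```

A tree is built by applying `best_split` recursively to each side until a stopping rule (for example a maximum number of splits) is reached.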
Figure 8. Decision tree algorithm structure [16] 


2.2.4. Naïve Bayes 

Naïve Bayes is used to compare the output variables of subset selection, the factors with the greatest relative effect on classification, and a combination of variables from both the generalized additive and naïve Bayes models, by directly comparing the precision of each top model's predictions. Using a variety of splines, second-degree polynomials, and linear predictor terms gave a better understanding of the relationship between individual predictors and the response. Predictors found to have nonlinear relationships with the response variable were given additional polynomial and spline terms, as shown in Figure 9. 
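For reference, the sketch below shows a Gaussian naïve Bayes classifier, one common variant (the paper does not specify which was used): each feature is modelled by a per-class normal distribution, and the features are assumed independent given the class, so their log-likelihoods simply add. The household features, labels, and function names are made up for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Per-class prior, feature means and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        # A tiny constant keeps a zero-variance feature from dividing by zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*members), means)]
        model[label] = (n / len(rows), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class maximising log prior + sum of per-feature log-likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

# Made-up households: (daily kWh, evening peak kW) -> consumption class.
rows = [(5, 0.5), (6, 0.6), (7, 0.7), (5.5, 0.4), (6.5, 0.8),
        (15, 2.0), (16, 2.5), (14, 1.8), (17, 2.2), (15.5, 1.9)]
labels = ["low"] * 5 + ["high"] * 5
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (6, 0.6)), predict_gaussian_nb(model, (16, 2.1)))
```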


Figure 9. Naïve Bayes process (left) compared with SVM (right) for building a classifier [17] 


2.2.5. Logistic regression 

When preprocessing the data for logistic regression, we observed a number of missing values distributed evenly throughout the dataset. Removing all observations with any missing data discarded about 30% of the dataset, because the missing values were not confined to a few variables or predictors of energy consumption. Since the dataset is relatively small, we decided that a better approach was to replace missing non-categorical values with the mean of their column, so as to keep as much data as possible. Because the data have a small variance, this mean imputation does not heavily affect the overall statistics. 
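The column-mean imputation described above can be sketched in a few lines. This minimal version assumes missing numeric values are marked with `None` and contrasts it with listwise deletion; the data are made up. scikit-learn's `SimpleImputer(strategy="mean")` implements the same idea for arrays.

```python
from statistics import mean

def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    rows: list of equal-length lists of numbers, where None marks a missing value.
    Returns a new list; the input is not modified.
    """
    col_means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        col_means.append(mean(observed) if observed else None)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

data = [
    [3.2, None, 1.0],
    [2.8, 0.4, None],
    [None, 0.6, 3.0],
]
complete_cases = [row for row in data if None not in row]
print(len(complete_cases))  # listwise deletion drops every row of this sample
print(impute_column_means(data))
```

With missing values spread evenly across columns, as the paragraph describes, listwise deletion can discard far more rows than the share of missing cells suggests, which is the motivation for imputing instead.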


3. RESULTS AND DISCUSSION 

In this work, the geometric and statistical center (centroid) of each class is computed first, and the distance between two classes is taken as the distance between their centroids. Figure 10 shows the overall classification results of the data mining approaches on the statistical dataset. The outputs used in the classification analysis depend on the specific aim of segmenting and classifying the load profiles' energy consumption, and it is up to the analyst to decide which input data units are most convenient. For decision tree classification, the reported results include the number of records, precision, recall, F-measure, accuracy, and time; the generation and execution time is 0.43 seconds, with a recorded precision of 0.968. Figure 11 shows the consumption prediction accuracy of the data mining approaches. 
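The evaluation steps in this paragraph, the distance between class centroids and the record count, precision, recall, F-measure, and accuracy, can be computed as sketched below. This is a minimal illustration with made-up points and labels and hypothetical function names; it treats one class as positive, since the paper does not state how precision and recall were averaged across classes, and it omits the timing measurement.

```python
import math

def centroid(points):
    """Geometric centre: the per-feature mean of the class members."""
    return [sum(col) / len(points) for col in zip(*points)]

def centroid_distance(class_a, class_b):
    """Distance between two classes, taken as the Euclidean distance of their centroids."""
    return math.dist(centroid(class_a), centroid(class_b))

def evaluate(y_true, y_pred, positive):
    """Record count, precision, recall, F-measure and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"records": len(pairs), "precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}

print(centroid_distance([(0, 0), (2, 0)], [(3, 4), (5, 4)]))
report = evaluate(["high", "high", "low", "low", "high"],
                  ["high", "low", "low", "high", "high"], positive="high")
print(report)
```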

For SVM classification, the same outputs are used for segmenting and classifying the load profiles' energy consumption (number of records, precision, recall, F-measure, accuracy, and time); the generation and execution time is 0.99 seconds, with a recorded precision of 0.906. In the discussion, after analyzing these features, the group of households with large energy-saving potential could be detected, mainly owing to their high number of electrical appliances. The characteristics most relevant to defining a consumer with high savings potential were: type of occupants' employment, number of adults and children, type of space heating, type of domestic hot water, heating, overall amount of household usage, and dwelling construction year. These data are not available in the present study, where the feature data are considerably more limited in samples and attributes. Moreover, although this work describes data related to households and householders, these data are incomplete and must be further processed to remove inaccurate and redundant information. The refined data can be evaluated with histograms, which depict the results and serve as the basis for drawing conclusions. Table 1 compares the accuracy achieved by all data mining techniques, and Table 2 compares the proposed technique with the existing literature. 
Figure 10. The overall classification results of the data mining approaches on the statistical dataset 
Figure 11. The consumption prediction accuracy results of the data mining approaches 


Table 1. Comparison of the accuracy (%) achieved by all data mining techniques 

Trained household   Logistic regression   Random forest   SVM     Naïve Bayes   Decision tree 
1                   90.15                 86.67           89.82   84.68         89.86 
2                   89.91                 89.56           88.5    81.69         85.73 
3                   90.15                 87.69           90.15   83.46         90.15 
4                   90.15                 89.10           90.05   84.52         89.96 
5                   91.08                 88.74           89.77   85.71         90.16 
Table 2. Comparison of the proposed technique with existing literature 

Article    Domain              Research share (%) 
[18]       Machine learning    41.71 
[19]       Big data analysis   13.56 
Proposed   Data mining         44.73 


4. CONCLUSION 

The purpose of this work is to study the feasibility of developing a system based on data mining and data analysis for predicting electrical consumption on a general-purpose processor. The study is an example of a general management initiative to attract users and foster electric power efficiency among them, aiming to provide energy knowledge, understanding, and guidance so that users can reduce their consumption with the help of data mining and data analysis. To carry out this type of study at a large scale, access to the data and to classification algorithms is vital. The combination of smart meter deployment and data analytics is therefore expected to play a vital role in the power sector. Data mining techniques deal with large quantities of raw data that need to be stored and, once analyzed, can be transformed into valuable information that benefits both the utility and the customer, with the aim of enhancing customer engagement and the value of the service. This creates new business opportunities, mainly related to data mining and data analysis, in response to market needs. 